202211.00322v1 


chinaXiv 


RESEARCH PAPER 


OpenKG Chain: A Blockchain Infrastructure for 
Open Knowledge Graphs 


Huajun Chen†, Ning Hu, Guilin Qi, Haofen Wang, Zhen Bi, Jie Li & Fan Yang


1 College of Computer Science & AZFT Knowledge Engine Lab, Zhejiang University, Hangzhou 310058, China
2 Ontology Foundation, Venture Drive #11-31, Vision Exchange, Singapore 608526, Singapore
3 School of Computer Science and Engineering, Southeast University, Nanjing 211189, China
4 Intelligent Big Data Visualization Lab, Tongji University, Shanghai 200092, China


Keywords: Open knowledge graph; Blockchain; Decentralized distributed network; Decentralized knowledge 
graph 


Citation: Chen, H.J., et al.: OpenKG chain: A blockchain infrastructure for Open Knowledge Graphs. Data Intelligence 3(2), 
205-227 (2021). doi: 10.1162/dint_a_00095 
Received: January 10, 2021; Revised: March 15, 2021; Accepted: April 25, 2021 


ABSTRACT 


The early concept of knowledge graph originates from the idea of the semantic Web, which aims at using 
structured graphs to model the knowledge of the world and record the relationships that exist between things. 
Currently, publishing knowledge bases as open data on the Web has gained significant attention. In China, 
the Chinese Information Processing Society of China (CIPS) launched OpenKG in 2015 to foster the 
development of Chinese Open Knowledge Graphs. Unlike existing open knowledge-based programs, 
OpenKG chain is envisioned as a blockchain-based open knowledge infrastructure. This article introduces 
the first attempt at implementing the sharing of knowledge graphs on OpenKG chain, a blockchain-based 
trust network. We have completed the test of the underlying blockchain platform, as well as the on-chain test of 
OpenKG's data set and tool set sharing and of fine-grained knowledge crowdsourcing at the triple level. 
We have also proposed two novel definitions, K-Point and OpenKG Token, which can be regarded as 
measurements of knowledge value and user value, respectively. 1,033 knowledge contributors have been involved in two 
months of testing on the blockchain, and the cumulative number of on-chain recordings triggered by real 
knowledge consumers has reached 550,000 with an average daily peak value of more than 10,000. For the 
first time, we have tested and realized on-chain sharing of knowledge at entity/triple granularity level. At 
present, all operations on the data sets and tool sets at OpenKG.CN, as well as the triplets at OpenBase, are 
recorded on the chain, and corresponding value will also be generated and assigned in a trusted mode. Via 
this effort, OpenKG chain looks forward to providing a more credible and traceable knowledge-sharing 
platform for the knowledge graph community. 


† Corresponding author: Huajun Chen (Email: huajunsir@zju.edu.cn; ORCID: 0000-0001-5496-7442). 

1. OPEN KNOWLEDGE ECOSYSTEM 
1.1 Knowledge Graphs as World Models 


Knowledge, such as facts, information, or descriptions, is the awareness and understanding of the world. 
It is acquired through experience or education by perceiving, discovering, and learning [1, 2]. The 
early concept of the knowledge graph originates from the idea of the semantic Web [3, 4] proposed by Tim Berners-Lee, who 
is credited as the inventor of the World Wide Web. It aims at using structured graphs to model the knowledge 
of the world and record the relationships that exist between things in the world [5]. 


Generally speaking, knowledge graphs (KGs) are directed labeled graph (DLG) structures that capture 
knowledge in the form of triplets (subject, predicate, object), expressed as (s,p,o), where s and o denote 
entities, and p establishes the relationship between entities. Due to the convenience of building semantic 
connections between real-world objects and domain knowledge, many large-scale business KGs have 
been built in recent years such as Google and Baidu Knowledge Graph, Microsoft Satori, and Product 
Knowledge Graph from Alibaba and Amazon. These KGs have led to a broad range of important applications, 
such as question answering [6], language understanding [7, 8], relational data analysis [9, 10], and 
recommendation systems [11]. Meanwhile, building on early AI research on knowledge representations such 
as ontologies [12, 13] and the subsequent consolidation of semantic Web standards such as RDF/OWL [14, 15], 
the most recent advances in deep representation learning and graph neural networks [16, 17] have led to 
new developments in knowledge graph technologies. 
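
The triple form described above can be illustrated with a minimal Python sketch. The `Triple` type, the `build_graph` helper, and the example facts are hypothetical and are not taken from any OpenKG data set; the sketch only shows how (s, p, o) statements map onto a directed labeled graph.

```python
from collections import defaultdict
from typing import NamedTuple


class Triple(NamedTuple):
    """One (s, p, o) statement: s and o are entities, p labels the directed edge."""
    s: str
    p: str
    o: str


def build_graph(triples):
    """Index triples as a directed labeled graph: subject -> list of (predicate, object)."""
    graph = defaultdict(list)
    for t in triples:
        graph[t.s].append((t.p, t.o))
    return graph


# Illustrative facts only (assumption, not OpenKG data).
triples = [
    Triple("Hangzhou", "locatedIn", "Zhejiang"),
    Triple("Zhejiang", "locatedIn", "China"),
    Triple("Zhejiang University", "locatedIn", "Hangzhou"),
]
graph = build_graph(triples)
```

Each subject becomes a node whose outgoing edges carry the predicate as a label, which is the structure that graph traversal and search operate on.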


1.2 Open Knowledge Graphs 


Along with the burgeoning of the largest open knowledge-sharing medium, i.e., the World Wide Web, 
publishing knowledge bases as open data on the Web has gained significant attention since the early days 
of the Web. Typical examples include the Linked Open Data effort, which was initiated by the semantic 
Web community and has already collected over 3,000 public linked data sets; ConceptNet [18], 
which originates from the Web-based, crowdsourced Open Mind Common Sense project launched by the MIT 
Media Lab in 1999; and Wikidata [19], a free and editable knowledge base set up by the Wikimedia 
Foundation. In China, the Chinese Information Processing Society of China (CIPS) launched OpenKG in 
2015 to foster the development and openness of Chinese knowledge graphs. OpenKG has accumulated 
over 200 billion triplets in Chinese since its launch, and its size is growing fast. Unlike existing open 
knowledge base programs, OpenKG chain is envisioned as a blockchain-based open knowledge 
infrastructure, which is introduced in detail in the following sections. 


1.3 Value Chain of Open Knowledge 


Knowledge is a valuable resource. As illustrated in Figure 1, the production, transformation, exchange, 
and consumption of knowledge form the value chain of knowledge in society. On top of the open Web 
infrastructure, it is a real challenge to build a trusted value chain that supports the lifecycle of knowledge. 

Figure 1. Open ecosystem of knowledge graphs. 


1.3.1 Incentive Knowledge at a Triple Granularity 


The logic linking contribution and incentives in a society is straightforward: the more 
contributions we make, the more incentives we receive, which motivates people to contribute more, and with 
better quality, to society. According to Maslow’s hierarchy of needs, incentives are not limited to income, but 
they work better when they are measurable. As people share their own knowledge and create social value, it should be possible to 
evaluate that knowledge and reward it directly. The Web, like a giant social medium, cannot satisfy the 
requirement to track, evaluate, and validate the contribution of shared knowledge and to grant incentives. Given 
a knowledge graph, a further challenge is to provide knowledge-based incentives at triple granularity. From the 
delivery of factual knowledge in triple format onward, every step of its verification, consumption, transmission, 
and deletion should be traceable and measurable. The basic requirement of a robust sharing platform is to evaluate 
the value of knowledge in triple form, track its contribution during knowledge processing or consumption, 
and grant proper incentives over the whole lifecycle of the knowledge. 

1.3.2 Self-sovereign Knowledge 


At present, most knowledge graphs are hosted on centralized systems, and their data are stored on centralized servers. 
Contributors therefore cannot fully control their knowledge, and ownership of the knowledge passes 
to the centralized systems. OpenKG chain uses the term self-sovereign knowledge for the concept that individuals 
or organizations take full responsibility for their knowledge, have full control of it, and 
reveal it under privacy or copyright protection. Individuals and organizations with decentralized 
identifiers (in any KG system) can be discovered and located, and can share their knowledge triples with each other 
without going through an intermediary, even across systems. OpenKG chain aims to provide a self-sovereign, 
interlinked KG system with privacy protection and without an intermediary. 


1.3.3 Adversarial Attack and Knowledge Accountability 


In an open medium like the Web, everyone can publish statements or contribute and consume knowledge 
equally. This raises a salient concern: holding all stakeholders accountable for the statements 
they make and the actions they take on knowledge. Firstly, statement accountability promotes the authenticity 
of a statement, because everyone is aware that he/she is responsible for what is stated. Secondly, action accountability 
monitors all activities performed upon knowledge and keeps track of the whole lifecycle of a triple, from 
its birth through edits and consumption to deletion, deterring illegal actions or adversarial attacks on the knowledge 
base. For example, someone may add a malicious rumor to a knowledge base or illegally delete 
a fact he/she is reluctant to reveal to the public. 


1.3.4 Immutable Knowledge and Tamper Resistance 


Another issue relevant to knowledge accountability is tamper resistance. In some cases, the knowledge 
triples are sensitive or the integrity of their content is critical. They must therefore be protected from 
fraudulent and intentional modification, and a tamper-resistant network infrastructure is thus 
required. Such an infrastructure ensures that once a statement is committed to the network, both the content and all the 
follow-on transactions made upon the statement cannot be altered or compromised retrospectively. 
The integrity of the knowledge content is guaranteed, and the transactions cannot be tampered with by 
any means. 


1.3.5 Lighting-up and Dissemination of Knowledge Value 


The consumption of knowledge is the most direct way to measure the value of knowledge. The more 
knowledge is consumed, the higher its value. Meanwhile, the consumption of knowledge 
triggers the dissemination of the value of knowledge. We call the process of knowledge being consumed 
the lighting-up of knowledge value. Different usage scenarios of the knowledge graph allow different knowledge 
users to light up nodes in the knowledge graph and thereby trigger the spread of knowledge. 


“Lighting-up by search” means that knowledge users consume knowledge during the search 
process, which triggers the value lighting-up of the searched knowledge item. Knowledge graphs support 
semantically related search, and further related searches continue to trigger new knowledge lighting-up. 
Each step of lighting-up records the value generated. Since knowledge comes from different producers, the 
produced value also needs to be awarded to the corresponding knowledge producers on the chain in an 
accountable way. 


“Lighting-up by question answering” is similar to “Lighting-up by search”. A question issued by 
a user triggers the lighting-up of the knowledge triples touched by the question. Meanwhile, the intermediate 
nodes traversed from the starting node to the answer node during question answering retrieval 
are also lighted up and value-recorded. 
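
Lighting-up along a retrieval path can be sketched as follows. This is a minimal illustration under stated assumptions: the triples, the producer names, and the `find_path` and `light_up` helpers are hypothetical, and a plain breadth-first search stands in for whatever retrieval method a real question answering system uses.

```python
from collections import deque

# Hypothetical triples; the fourth field names the producer of each triple.
TRIPLES = [
    ("Hangzhou", "locatedIn", "Zhejiang", "alice"),
    ("Zhejiang", "locatedIn", "China", "bob"),
    ("Hangzhou", "hasUniversity", "Zhejiang University", "carol"),
]


def find_path(start, goal):
    """Breadth-first search over subject -> object edges; returns the triples on one shortest path."""
    queue = deque([(start, [])])
    seen = {start}
    while queue:
        node, path = queue.popleft()
        if node == goal:
            return path
        for triple in TRIPLES:
            s, _, o, _ = triple
            if s == node and o not in seen:
                seen.add(o)
                queue.append((o, path + [triple]))
    return None


def light_up(path, ledger):
    """Record one lighting-up event per traversed triple and name the producer to credit."""
    for s, p, o, producer in path:
        ledger.append({"triple": (s, p, o), "producer": producer})
    return ledger


# A question starting at "Hangzhou" whose answer node is "China".
ledger = light_up(find_path("Hangzhou", "China"), [])
```

Every triple on the path from the starting node to the answer node yields one record, so intermediate producers are credited alongside the producer of the final answer.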


“Lighting-up by inference” refers to the knowledge lighting-up triggered by the inference process. The 
knowledge in a knowledge graph is usually incomplete, and reasoning over the knowledge 
graph is carried out on the basis of the knowledge that already exists in it. At the same time, because 
knowledge comes from many sources, lighting-up by inference may also be completed in a federated manner, that 
is, lighting-up by federated inference [20, 21, 22]. 


“Lighting-up by analysis” refers to the comprehensive analysis of knowledge from different sources, which 
continuously triggers the lighting-up of related knowledge in the knowledge graph. Similarly, due to 
the diversity of knowledge sources, the analysis process may also be completed in a federated manner. For 
example, an analysis model may be established through federated learning [23, 24]. 


2. WHEN KNOWLEDGE GRAPH MEETS BLOCKCHAIN 


2.1 Blockchain and Distributed Ledger 


Blockchain [25, 26] uses distributed ledger technology [27, 28, 29]: a ledger 
database that is shared, replicated, and synchronized within an open P2P network. Data storage and processing 
are carried out by each node in the P2P network. Therefore, each node can participate in monitoring the 
legitimacy of transactions and can attest to transaction results. The blockchain constitutes a multi-centralized 
network with consensus [30, 31, 32] on a complete transaction log and its execution results, and 
is characterized by immutability, traceability, and right confirmation. Based on these characteristics, 
blockchain technology lays a solid foundation of trust and creates a reliable cooperation 
mechanism. An open knowledge platform needs to support collective maintenance of knowledge, 
trace the full evolution of knowledge, measure the contribution of knowledge contributors 
quantitatively, and meet data privacy requirements. The development of a knowledge graph must also take 
into account knowledge quantification, the tracing of iteration history, and the governance of knowledge 
points, contributors, and the knowledge development environment. OpenKG blockchain is used to 
satisfy these requirements. 
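
The immutability and traceability properties can be illustrated with a single-node hash chain. The sketch below is an assumption-laden simplification: it omits the P2P network and the consensus mechanism entirely, and the record fields (`op`, `triple`) are hypothetical. It only shows why a retroactive edit to a linked log is detectable.

```python
import hashlib
import json


def block_hash(index, prev_hash, record):
    """SHA-256 over the block's position, its predecessor's hash, and its payload."""
    payload = json.dumps({"index": index, "prev": prev_hash, "record": record}, sort_keys=True)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def append_block(chain, record):
    """Append a record, linking it to the hash of the previous block."""
    prev_hash = chain[-1]["hash"] if chain else "0" * 64
    index = len(chain)
    chain.append({"index": index, "prev": prev_hash, "record": record,
                  "hash": block_hash(index, prev_hash, record)})


def verify(chain):
    """Recompute every hash and link; a retroactive edit makes this return False."""
    prev_hash = "0" * 64
    for block in chain:
        if block["prev"] != prev_hash:
            return False
        if block["hash"] != block_hash(block["index"], block["prev"], block["record"]):
            return False
        prev_hash = block["hash"]
    return True


chain = []
append_block(chain, {"op": "add", "triple": ["Hangzhou", "locatedIn", "Zhejiang"]})
append_block(chain, {"op": "consume", "triple": ["Hangzhou", "locatedIn", "Zhejiang"]})
ok_before = verify(chain)
chain[0]["record"]["op"] = "delete"  # tamper with an already committed record
ok_after = verify(chain)
```

In a real blockchain the same check is performed independently by many nodes, which is what turns tamper evidence into tamper resistance.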

2.2 Open Knowledge and Blockchain 


The construction of open-domain knowledge graphs reflects the social attributes of the open community 
which brings forth a variety of challenges: 


• Identify more individual roles and avoid oligopoly of open knowledge. It is required to identify the 
same entity with different roles to participate in collaborative work and clarify the contribution of 
different roles to the open knowledge network. Furthermore, open knowledge contributors manage 
their data autonomously to avoid unauthorized misuse caused by data concentration. 

• Support for more decentralized trust management and more controllable qualification of domain 
experts for open knowledge. The qualification of domain experts in different fields is essential for 
high-quality knowledge crowdsourcing. The levels of qualification recognition need to be adjusted 
dynamically, and in turn, more quantitative and fine-grained quantitative evaluation programs need 
to be implemented. 

• The ability to quantify the contributions of massive participants. It is required to track the value of 
open knowledge contributed by a large number of contributors and adjust the knowledge value model 
based on feedback from massive participants. 


By using distributed ledger technology, the generation, development, and derivation of open 
knowledge are recorded, so that the value and ownership of open knowledge can be fully tracked. For instance, 
a multi-centralized blockchain network provides trusted infrastructure, tracks the process of open knowledge 
development, and guarantees data authenticity. A decentralized identity system supports multi-dimensional 
management of distributed data tokens and massive user tokens. The distributed token solution of the 
blockchain supports the calculation of knowledge value points, reflecting the value of open knowledge. 


In summary, an open knowledge graph built on a decentralized distributed network is bound to 
face many issues, including incentives, ownership management, traceability, trust, and privacy. However, 
existing centralized knowledge graph management platforms do not consider these issues, which 
discourages the sharing and interconnection of knowledge; nor can they guarantee the authenticity and 
timeliness of knowledge. We therefore propose a blockchain-based open knowledge graph platform, the functional 
components of which can be sorted into three layers: knowledge production, knowledge 
dissemination, and knowledge consumption, as shown in Figures 2 and 3. The knowledge production layer 
corresponds to traditional technologies such as knowledge modeling, extraction, fusion, and verification. 
The knowledge dissemination layer needs to consider fine-grained knowledge rewarding, self-sovereign 
knowledge management, knowledge accountability, adversarial attacks and tamper resistance, and data 
privacy protection. The knowledge consumption layer includes semantic search and question answering, 
reasoning, federated learning, process automation such as robotic process automation (RPA) [33, 34], 
and other applications that need to be built on distributed knowledge sources. 

Figure 2. Federated knowledge graph technology platform architecture. 

Figure 3. OpenKG resources sharing platform. 


3. OPENKG BLOCKCHAIN 
3.1 OpenKG Resource Model 


OpenKG chain consists of several websites aiming at sharing different types of knowledge graph resources. 

• The OpenKG main site provides a sharing platform for coarse-grained open resources such as KG 
data sets and KG tool sets contributed by the KG community in China. 

• CnSchema provides a crowdsourced open schema for Chinese knowledge graphs. 

• OpenBase is a fine-grained triple-level knowledge graph crowdsourcing platform. 

At present, OpenKG chain has completed the construction of the underlying blockchain infrastructure, 
as well as the on-chain testing of sharing different types of knowledge resources collected by the OpenKG 
community through OpenKG chain. The initial number of nodes in the OpenKG blockchain network is tentatively set to 
seven; the nodes are delivered to and deployed in different universities or corporate institutions for operation. 
These seven nodes are independent of each other and form a multi-center blockchain infrastructure 
for the OpenKG community, which builds upon a consensus mechanism to provide a distributed trusted 
infrastructure. More core nodes can be added gradually as needed. The test platform already has 
more than 1,000 registered knowledge contributors. Over the two-month on-chain test, the daily average 
reached 10,691 on-chain recordings, and the total number of lighting-up events and on-chain deposits exceeded 550,000. It is the 
first test that has realized knowledge confirmation at entity/triple granularity (Figure 4). 

Figure 4. Statistics of the number of OpenKG lighting-up events in May 2020. Note: Txt Count: KG access count; 
Unique Senders: unique KG point dispatchers; Unique Receivers: unique KG point receivers; Total Unique: the sum of 
unique senders and unique receivers. 


3.2 OpenKG Value Model 


The first issue that the OpenKG chain needs to address is to define proper value models to reflect the 
value of knowledge. In the case of KG, the value calculation needs to be finely controlled at a triple level. 
The K-Point is proposed to measure knowledge value for triple knowledge published in OpenKG. Secondly, 
since OpenKG chain gathers knowledge in the form of community crowdsourcing [35, 36, 37], we also 
need to design a value model to measure and honor the contribution of knowledge contributors (Table 1). 

Table 1. Statistics of the number of times OpenKG chain lighting-up in May 2020. 

Curve | Characteristics
Knowledge value (unit price) | In the process of the value development of a knowledge unit, when a few people understand it, the unit value is higher. With more and more acceptance and use, the unit value gradually decreases.
Knowledge consumer | Knowledge is limited by the domain, and the number of people who understand gradually increases, and the domain is gradually saturated. The more knowledge audiences, the more knowledge use.
Relevant knowledge points | As knowledge is accepted, it will reason or discover the relationship with other knowledge and form new knowledge. The more relevant knowledge points, the more knowledge will be used.
Value of knowledge | The number of uses of knowledge and the unit price of knowledge form the value of knowledge.
Cumulative value of knowledge | Because of the consistency of knowledge, knowledge has cumulative value.


According to Maslow’s hierarchy of needs, the incentive to contribute is not limited to income, but 
works better when it is measurable. The knowledge value point (K-Point) is proposed to measure value from a knowledge 
perspective, and the OpenKG Token is proposed to measure contributions and honor contributors. The 
initial hypotheses of the knowledge value model are summarized in Table 1, and the 
model is illustrated in Figure 5. 

Figure 5. Knowledge value model. 

3.2.1 K-Point: Knowledge Value Measurement 


OpenKG chain has designed the K-Point contract to reflect the value of knowledge. The assessment of 
knowledge value is based on a simple model, e.g., each time the knowledge is used, the K-Point is increased 
accordingly. In current settings, a simple chi-square distribution is used for fitting the model as illustrated 
below. As knowledge usage scenarios increase, OpenKG chain will continue to use some learnable 
algorithms to calibrate and optimize the value evaluation models. 


$$f(x, v) = \mathrm{Gamma.dist}\left(x, \tfrac{v}{2}, 2\right) = \frac{x^{v/2-1}\, e^{-x/2}}{2^{v/2}\,\Gamma(v/2)} \tag{1}$$


Without considering the interrelationship of knowledge applications, let: 

$$f_{\text{knowledge value expectation}}(K, x) = f(x, 2) \tag{2}$$

$$f_{\text{knowledge use expectation}}(K, x) = \ldots \tag{3}$$

Let $x \in (0, 10]$, let $\mathrm{count}_{\text{knowledge use}}(n)$ be the number of knowledge uses on the nth day, and let the value period 
of the knowledge point be t (days). Then the unit price of knowledge usage on the nth day is: 

$$f_{\text{knowledge unit price}}(K, n) = \ldots \tag{4}$$

Each time the knowledge is used, the K-Points are weighted according to the unit price of knowledge 
usage. 
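
A minimal Python sketch of K-Point accrual is given here. Only `chi_square_pdf` follows Equation (1) directly. The `unit_price` function is an assumption made for illustration: it maps day n of a t-day value period linearly onto x in (0, 10] and evaluates f(x, 2), which is not the paper's unit price formula. All function names are hypothetical.

```python
import math


def chi_square_pdf(x, v):
    """Equation (1): density of Gamma(shape=v/2, scale=2), i.e. chi-square with v degrees of freedom."""
    if x <= 0:
        raise ValueError("x must be positive")
    k = v / 2.0
    return x ** (k - 1) * math.exp(-x / 2.0) / (2.0 ** k * math.gamma(k))


def unit_price(day, period_days):
    """ASSUMPTION: day n of a t-day value period maps linearly onto x in (0, 10]."""
    x = 10.0 * day / period_days
    return chi_square_pdf(x, 2)


def accumulate_k_points(daily_uses, period_days):
    """Each use adds the unit price of its day; daily_uses[i] is the use count on day i + 1."""
    return sum(count * unit_price(day, period_days)
               for day, count in enumerate(daily_uses, start=1))


k_points = accumulate_k_points([3, 5, 2], period_days=30)
```

With v = 2 the density reduces to e^{-x/2}/2, a monotonically decreasing curve, so under this stand-in early uses are weighted more heavily than later ones, in line with the hypothesis that unit value falls as acceptance grows.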


3.2.2 OpenKG Token: Honor Point Measurement 


OpenKG chain has designed the OpenKG-Token contract to honor knowledge contributors (publishers, 
reviewers, and modifiers). The OpenKG-Token is dynamically calculated and distributed to the knowledge 
contributors when the knowledge is used. The more a piece of knowledge is used, the more points are rewarded to 
its contributors. In the initial setting, the value is distributed equally among the knowledge contributors. 


$$f_{\text{single honor value}} = \frac{f_{\text{knowledge unit price}}}{\mathrm{Count}(\text{contributor})} \tag{5}$$

The total OpenKG Token satisfies the following relationship: 

$$f_{\text{total honor value}} = g(K) \sum f_{\text{knowledge value}} \tag{6}$$

In the initial situation, g(K) = 1. 
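
The equal-split rule and the total honor relationship can be sketched in a few lines of Python. The function names, the contributor labels, and the unit price of 0.3 are hypothetical; the sketch is not the OpenKG-Token contract itself.

```python
def single_honor_value(unit_price, contributors):
    """Split the unit price of one use equally among the contributors."""
    if not contributors:
        raise ValueError("a knowledge item needs at least one contributor")
    return unit_price / len(contributors)


def distribute_tokens(unit_price, contributors, balances):
    """Credit each contributor's balance after one use of the knowledge item."""
    share = single_honor_value(unit_price, contributors)
    for name in contributors:
        balances[name] = balances.get(name, 0.0) + share
    return balances


def total_honor_value(knowledge_values, g=1.0):
    """g(K) times the summed knowledge value; g(K) = 1 in the initial situation."""
    return g * sum(knowledge_values)


# One use with an illustrative unit price, shared by three contributor roles.
balances = distribute_tokens(0.3, ["publisher", "reviewer", "modifier"], {})
```

Because every use adds the full unit price across the contributors, the balances sum to the total knowledge value whenever g(K) = 1.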

3.3 Decentralized Identity Management in OpenKG Chain 


The key to OpenKG chain is a trusted infrastructure. OpenKG chain adopts the VBFT consensus algorithm, 
which builds on the traditional BFT (Byzantine Fault Tolerance) algorithm and introduces a VRF (Verifiable Random Function) 
to improve the attack resistance of the consensus algorithm while 
increasing consensus speed. The Ontology Network uses WasmJIT technology as the smart contract 
execution environment. It also provides Layer 2 technologies to balance on-chain business 
performance and blockchain network expansion (Figure 6). 


Figure 6. OpenKG chain layered architecture. Components shown: OpenBase crowdsourcing and knowledge applications; the OpenKG tool set, data set, and schema; the Data Interoperability Protocol and DDXF (Distributed Data/Resource Exchange Framework); ONT ID (Decentralized Identity Framework); and the Ontology/OpenKG Network. 


At the business application level, OpenKG chain proposes the decentralized identity identification 
protocol (ONT ID) for identity management in the whole lifecycle of OpenKG chain including K-Points 
calculation, resource management, and contributor’s identification. The distributed data exchange framework 
(DDXF) manages and tracks the whole process of knowledge construction, dissemination, and consumption 
with cross-system interoperability protocols. ONT ID can issue verifiable credentials for identifying entities, 
verifying credentials, supporting multi-dimensional authentication, and accessing different trusted sources. 
Distributed identity identification and multi-dimensional verifiable credentials provide a credible account 
system and risk control model for different use scenarios of knowledge. 


3.4 Data Right Management in OpenKG Chain 


The construction and use of OpenKG chain’s data involve multiple rights, such as knowledge ownership, 
sorting, processing, viewing, and downloading. A salient challenge here is to support right management 
at different levels of knowledge granularity, such as data sets, entities, and triplets. OpenKG chain 
uses a distributed identity and token scheme to provide fine-grained authority management for multiple 
types of knowledge resources. 

Firstly, every item of OpenKG chain data, down to a single triple, holds an ONT ID, so that data 
are uniquely identified across different systems. Further, for different knowledge usage scenarios, knowledge owners and 
contributors can actively create knowledge authority tokens, which are handled completely on the chain 
and guarantee safety and reliability throughout the whole process of token usage. Meanwhile, all OpenKG chain 
users also hold an ONT ID, which identifies the same user across different knowledge usage scenarios in 
different systems and allows operations to be traced back to knowledge contributors across systems, ensuring the traceability of 
all operations. As shown in Figure 7, the implementation details are summarized below: 


• Data and user entities have ONT IDs. 

• For different scenarios, addition, deletion, modification, and query operations upon knowledge 
are managed through off-chain tokens. 

• Each off-chain authority token corresponds to an on-chain data token, namely the OpenKG data-token. 

• The property relationship between the data-token and the ONT ID on the chain is used to confirm 
cross-system token rights. 

• Operational authentication is performed through the binding relationship between the on-chain 
data-token and the off-chain system tokens. 

Figure 7. Data right management model of OpenKG chain. 
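
The binding between off-chain authority tokens and on-chain data tokens can be sketched as follows. This is a toy model under stated assumptions: the class names, field names, and identifier strings are hypothetical, a plain dictionary stands in for chain state, and no real ONT ID or DDXF interface is used.

```python
from dataclasses import dataclass, field


@dataclass
class DataToken:
    """On-chain data token: binds a data item's ONT ID to its owner's ONT ID."""
    token_id: str
    data_ont_id: str
    owner_ont_id: str


@dataclass
class AuthorityToken:
    """Off-chain authority token: grants operations and points at one on-chain data token."""
    holder_ont_id: str
    data_token_id: str
    operations: frozenset


@dataclass
class Registry:
    """Stand-in for chain state; a plain dict instead of a smart contract."""
    data_tokens: dict = field(default_factory=dict)

    def register(self, token: DataToken) -> None:
        self.data_tokens[token.token_id] = token

    def authorize(self, auth: AuthorityToken, user_ont_id: str,
                  data_ont_id: str, operation: str) -> bool:
        """Allow the operation only if the off-chain token is bound to a matching on-chain token."""
        on_chain = self.data_tokens.get(auth.data_token_id)
        return (on_chain is not None
                and on_chain.data_ont_id == data_ont_id
                and auth.holder_ont_id == user_ont_id
                and operation in auth.operations)


# Placeholder identifiers for illustration only.
registry = Registry()
registry.register(DataToken("dt-1", "ontid-triple-001", "ontid-alice"))
auth = AuthorityToken("ontid-bob", "dt-1", frozenset({"query"}))
can_query = registry.authorize(auth, "ontid-bob", "ontid-triple-001", "query")
can_delete = registry.authorize(auth, "ontid-bob", "ontid-triple-001", "delete")
```

Keeping the permission set off chain while anchoring its target on chain lets each system manage its own tokens, with the shared data token serving as the cross-system point of agreement.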


3.5 Trust Management in OpenKG Chain 


Why do contributors and users trust the knowledge published in OpenKG chain? For trust management, 
OpenKG chain provides credibility metrics for published knowledge at three levels (Figure 8): 


• Infrastructure Level. The underlying network scale and node distribution of the OpenKG blockchain 
provide the basic endorsement of the credibility of published knowledge. It is more difficult to 
cheat on a network with more decentralized nodes. 

• Knowledge Management Level. All operations on knowledge in OpenKG chain are recorded on 
the chain, which is tamper-proof and traceable, and this provides trust endorsement for the authenticity 
and consistency of the data. 

• Knowledge Contributor and User Level. Since all behaviors of contributors and users are also recorded 
and traceable on the chain, the analysis of contributors’ or users’ behavior can be used as a credible 
endorsement. It is worth mentioning that the blockchain cannot identify malicious data, but it can 
provide permanently valid proof of malicious behavior to parties outside the system, which in turn influences 
the behavior of contributors and users. 

Figure 8. OpenKG chain model architecture. 

4. IMPLEMENTATION AND EVALUATION 
4.1 General Implementation 


As introduced in the previous section, the initial number of nodes in the OpenKG blockchain network is tentatively 
set to seven, and the nodes are delivered to different institutes for operation. They form a multi-centered trusted 
network infrastructure upon which knowledge contributors can share data and ordinary users can retrieve 
knowledge in a safer and more accountable mode. For performance reasons, only operations on 
knowledge are recorded on chain. To synchronize on-chain operation records and off-chain knowledge, 
OpenKG chain implements a tokenized contract to solve the problem of identifying the data entities of 
off-chain knowledge. The whole process of using knowledge tokens on the chain is recorded to ensure the 
integrity of the operations while ensuring traceability. Additionally, OpenKG chain supports knowledge 
contributors in independently managing their knowledge data and also enables multi-party knowledge 
collaboration under the premise of knowledge privacy protection. In summary, the implementation of 
OpenKG blockchain enables the following new functionalities: 


• Knowledge Index and Resource Synchronization. The on-chain records provide an index to off-chain 
knowledge, which is stored in a distributed manner and synchronized through the blockchain. 

• Safe Knowledge Consumption. Any type of knowledge consumption, including browsing, downloading, 
and learning, is recorded on the chain, ensuring safer usage and knowledge exchange. 

• Accountable Knowledge Processing. Any type of operation on the knowledge, including addition, 
audit, modification, and abolition, is recorded on the chain, ensuring more accountable knowledge 
management. 

• Knowledge Traceability. Since all operations upon knowledge are recorded on the chain, the 
changelog of even a single triple can be traced based on the history of the alliance chain. 
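
Tracing the changelog of a single triple from on-chain operation records can be sketched as follows. The record layout (`seq`, `triple_id`, `op`, `actor`) and the sample entries are hypothetical; the sketch assumes that only operations, not the knowledge itself, are stored on chain, as stated above.

```python
from collections import defaultdict

# Hypothetical on-chain operation records; the knowledge itself stays off chain.
RECORDS = [
    {"seq": 1, "triple_id": "t-001", "op": "addition", "actor": "alice"},
    {"seq": 2, "triple_id": "t-002", "op": "addition", "actor": "bob"},
    {"seq": 3, "triple_id": "t-001", "op": "audit", "actor": "carol"},
    {"seq": 4, "triple_id": "t-001", "op": "modification", "actor": "bob"},
    {"seq": 5, "triple_id": "t-001", "op": "browsing", "actor": "dave"},
]


def build_index(records):
    """Group records by triple so that each triple's history can be looked up directly."""
    index = defaultdict(list)
    for record in sorted(records, key=lambda r: r["seq"]):
        index[record["triple_id"]].append(record)
    return index


def changelog(index, triple_id):
    """Return (seq, op, actor) for every recorded operation on one triple, oldest first."""
    return [(r["seq"], r["op"], r["actor"]) for r in index.get(triple_id, [])]


history = changelog(build_index(RECORDS), "t-001")
```

The same index serves both purposes named in the list: it points from the chain to off-chain knowledge, and it yields an ordered per-triple history covering processing and consumption alike.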


4.2 OpenKG.CN on Chain 
4.2.1 Introduction to OpenKG.CN 


OpenKG.CN is the main website that provides a unified sharing platform for different types of open 
resources. Currently it supports the sharing of open KG data and open tools. Users can freely contribute 
and download various types of resources on this platform. The OpenKG.CN platform currently supports 
three blockchain operations: user registration, resource registration and resource download. As seen in 
Figure 9, we build a visualization website for OpenKG community users to check their OpenKG tokens. 
We also make it possible to see the value of resources in Figure 10, which will be updated in real-time. 

Figure 9. OpenKG-token rankings at OpenKG.CN (real-time top-10 list with columns Name, Email, and Ranking). 

Figure 10. K-point rankings at OpenKG.CN. 

4.2.2 Resource Registration on Chain 


• User registration on the chain. When a user registers at OpenKG.CN, the system automatically 
registers the user information on the blockchain server and generates an on-chain 
account, i.e., an ONT ID that serves as a surrogate for the user on the chain. 

• Resource registration on the chain. After registering at OpenKG.CN, users can upload resources 
to the platform. For each resource, the system automatically generates a 
resource ID, i.e., an ONT ID for the data or tool, and registers the resource on the blockchain server. 
Note that no OpenKG Token is generated at this time, since value can only be 
generated when the knowledge is consumed. 
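
The two registration steps above can be sketched as follows. This is a minimal illustration under stated assumptions: `ToyRegistry` is a hypothetical in-memory stand-in for the blockchain server, and the `demo-user-...` and `demo-resource-...` identifiers are placeholders that do not follow the real ONT ID format.

```python
import uuid
from typing import Dict

RESOURCE_KINDS = {"data", "tool"}  # the two resource types named in the text


class ToyRegistry:
    """Hypothetical in-memory stand-in for registration on the blockchain server."""

    def __init__(self) -> None:
        self.user_ids: Dict[str, str] = {}               # username -> on-chain user ID
        self.resources: Dict[str, Dict[str, str]] = {}   # resource ID -> metadata
        self.token_balance: Dict[str, int] = {}          # user ID -> OpenKG Tokens

    def register_user(self, username: str) -> str:
        """Create a placeholder on-chain account when a user signs up."""
        if username in self.user_ids:
            raise ValueError(f"user already registered: {username}")
        user_id = "demo-user-" + uuid.uuid4().hex  # placeholder, not a real ONT ID
        self.user_ids[username] = user_id
        self.token_balance[user_id] = 0
        return user_id

    def register_resource(self, username: str, title: str, kind: str) -> str:
        """Register an uploaded resource; no tokens are generated at this point."""
        if username not in self.user_ids:
            raise PermissionError("only registered users can upload resources")
        if kind not in RESOURCE_KINDS:
            raise ValueError(f"unknown resource kind: {kind}")
        resource_id = "demo-resource-" + uuid.uuid4().hex  # placeholder ID
        self.resources[resource_id] = {
            "owner": self.user_ids[username], "title": title, "kind": kind,
        }
        return resource_id


registry = ToyRegistry()
alice_id = registry.register_user("alice")
kg_id = registry.register_resource("alice", "example-kg", "data")
```

After both calls, the balance of `alice_id` is still zero, which mirrors the rule that value is generated only when the knowledge is consumed.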


4.2.3 Resource Value Lighting-up 


Since OpenKG.CN only provides coarse-grained knowledge sharing at the level of data sets or tools, 
resource consumption, and hence resource value lighting-up, is implemented mainly through downloading. 
When a resource is downloaded and used by other users, the system generates the corresponding 
OpenKG Tokens according to the resource ID and assigns them to the account of the resource contributor. 
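
A minimal sketch of download-based lighting-up follows. The amount generated per download is not stated in the text, so `TOKENS_PER_DOWNLOAD` is an assumed constant, and `DownloadLedger` and its fields are hypothetical names used only for illustration.

```python
from typing import Dict, List, Tuple

TOKENS_PER_DOWNLOAD = 1  # assumed amount; the actual value is not given in the text


class DownloadLedger:
    """Hypothetical bookkeeping for download-triggered token generation."""

    def __init__(self, contributors: Dict[str, str]) -> None:
        self.contributors = dict(contributors)  # resource ID -> contributor ID
        self.balances: Dict[str, int] = {}      # contributor ID -> OpenKG Tokens
        self.events: List[Tuple[str, str, int]] = []  # (resource, downloader, minted)

    def download(self, resource_id: str, downloader_id: str) -> int:
        """Record a download and credit the contributor of the resource."""
        contributor = self.contributors[resource_id]  # KeyError if not registered
        # Only downloads by *other* users light up the resource.
        minted = 0 if downloader_id == contributor else TOKENS_PER_DOWNLOAD
        if minted:
            self.balances[contributor] = self.balances.get(contributor, 0) + minted
        self.events.append((resource_id, downloader_id, minted))
        return minted


ledger = DownloadLedger({"res-1": "alice"})
first = ledger.download("res-1", "bob")     # bob downloads alice's resource
second = ledger.download("res-1", "alice")  # a self-download mints nothing
```

The downloader never receives tokens in this sketch; only the contributor's balance changes.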


4.3 OpenBase on Chain 
4.3.1 Introduction to OpenBase 


OpenBase is a crowdsourcing platform that enables fine-grained, triple-level knowledge sharing within the 
OpenKG community. The whole process includes triple addition, error checking, and knowledge graph 
reviewing. To address the fine-grained crowdsourced construction, error checking, and completion 
of knowledge graphs, OpenBase balances construction cost against construction speed: knowledge 
graphs are constructed by machines and then reviewed and modified by people. For 
existing knowledge graphs, OpenBase provides a unified platform on which crowdsourcers 
carry out tasks such as error checking and review of the knowledge graph. 


4.3.2 Fine-grained Knowledge on Chain 


Traditional knowledge graph crowdsourcing platforms cannot completely solve the problem of mutual 
trust among users. Inspired by the idea of blockchain, we build OpenBase upon a trusted chain network 
to enable trusted knowledge sharing at a fine-grained triple level. Figure 11 illustrates the whole procedure 
of crowdsourcing triples on chain. User operations such as adding a new triple, reviewing knowledge 
contributed by other users, searching or querying knowledge, or downloading the entire data set all 
generate related OpenKG Tokens. All these user operations on data are recorded on the 
blockchain for future inquiry. As for user management, when a user registers at OpenBase, the account 
is associated with an ONT ID, which is managed in a decentralized manner on the underlying blockchain. Any 
operation issued by the user is likewise associated with the corresponding data and recorded on the chain. 




For reward management, under the current setting, data access does not reward visitors with OpenKG Tokens; 
it only rewards those who contributed the initial data. An OpenKG Token is also generated when 
data are reviewed or checked by reviewers. This token is split into equal shares, which are 
distributed among the reviewers and the original contributor. In these 
cases, the owner of the data is still the original user who uploaded the data. However, if an edit operation 
is issued, i.e., a user modifies the data, the editor shares ownership of the data with the 
original contributor. Accessing the data (search, QA, etc.) is regarded as a lighting-up operation, 
which generates honor points for the contributors of the data set. Downloading the data set also generates 
OpenKG Tokens, which are divided among the data contributors. Adding entities and attributes is 
regarded as the registration of new data, and the operator becomes the owner of the new data. 
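
The reward rules in this paragraph can be sketched as follows. Several points are assumptions made for the sketch rather than details confirmed by the text: one unit is generated per operation, shares are exact fractions, access and download rewards go to all current owners, and honor points and OpenKG Tokens are tracked in a single counter. `RewardBook` and its method names are hypothetical.

```python
from fractions import Fraction
from typing import Dict, List


class RewardBook:
    """Hypothetical bookkeeping for OpenBase ownership and reward splitting."""

    def __init__(self) -> None:
        self.owners: Dict[str, List[str]] = {}   # item ID -> owners, original first
        self.rewards: Dict[str, Fraction] = {}   # user -> accumulated reward units

    def _split(self, recipients: List[str], amount: int) -> None:
        """Divide `amount` equally among the distinct recipients."""
        unique = list(dict.fromkeys(recipients))  # de-duplicate, keep order
        share = Fraction(amount, len(unique))
        for user in unique:
            self.rewards[user] = self.rewards.get(user, Fraction(0)) + share

    def add(self, item_id: str, user: str) -> None:
        """Adding new data registers it; the operator becomes its owner."""
        self.owners[item_id] = [user]

    def edit(self, item_id: str, editor: str) -> None:
        """An editor shares ownership with the original contributor."""
        if editor not in self.owners[item_id]:
            self.owners[item_id].append(editor)

    def review(self, item_id: str, reviewers: List[str], amount: int = 1) -> None:
        """Split the reward equally among reviewers and the original contributor."""
        original = self.owners[item_id][0]
        self._split(list(reviewers) + [original], amount)

    def consume(self, item_id: str, visitor: str, amount: int = 1) -> None:
        """Access or download rewards the owners; the visitor receives nothing."""
        self._split(self.owners[item_id], amount)


book = RewardBook()
book.add("t1", "alice")
book.review("t1", ["bob", "carol"])  # alice, bob, carol each receive 1/3
book.edit("t1", "dave")              # alice and dave now co-own t1
book.consume("t1", "erin")           # alice and dave each receive 1/2
```

Using `Fraction` keeps the equal shares exact; a production system would more likely use integer token units with an explicit rounding rule.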


[Figure 11 labels: Front-end; Interface; Main system; user registration, data upload, data access, data editing, data set download; Honor Value -> Contributor; Honor Value -> Reviewer, Acceptor, Contributor.] 


Figure 11. Illustration of OpenBase blockchain architecture. 


4.3.3 Lighting-up Fine-grained Knowledge 


For OpenBase, there are several ways of triggering the lighting-up of the knowledge value. 


• Data search and QA. When users search and query the data, the corresponding knowledge is 
lit up and a certain amount of OpenKG Tokens is generated. 

• Data downloading. When users download a data set, a certain amount of OpenKG Tokens is also 
generated to reward the data contributors. 

• Data review and checking. When users review and check data, all relevant reviewers and contributors 
are rewarded with a certain amount of honor. 

5. CONCLUSION AND FUTURE WORK 


Knowledge is a valuable resource, and linking knowledge can further increase its value. 
The production, transformation, exchange, and consumption of knowledge form the value chain of 
knowledge in society. The value network of a knowledge graph includes not only the contributors of knowledge 
but also its users. The whole process of knowledge construction and consumption 
gradually enriches the knowledge network and increases the value of knowledge. This process puts 
forward new requirements, such as rewarding knowledge at triple granularity, self-sovereign knowledge, 
resistance to adversarial attacks, and knowledge accountability. It is challenging to build a trustable value chain for 
the whole lifecycle of knowledge upon the open Web infrastructure. 


OpenKG chain attempts to address these challenges based on state-of-the-art blockchain 
technology. We hope to provide a valuable reference for communities that want to build their own 
enterprise-level knowledge graph crowdsourcing platforms. Although blockchain technology provides new solutions 
for some of the problems mentioned above, it cannot yet solve all of them. We still 
face many challenges, such as performance issues caused by fine-grained knowledge identification on the 
chain, decentralized storage of knowledge graphs, and trainable incentive models for knowledge 
crowdsourcing. 


The blockchain-based OpenKG chain framework provides a technical solution for managing the lifecycle 
of knowledge, as well as the process of its value discovery, from the perspective of knowledge itself. 
Furthermore, the process by which knowledge is formed faithfully reflects the self-sovereignty of user data. By bridging 
physical identity with digital identity and expanding the concept of knowledge to commonly valuable 
information from Web pages, it becomes possible to form an Internet “twin” social network. Its formation would 
in turn provide effective experimental support. Currently, OpenKG chain implements the on-chain test of 
data sets, tool sets, and knowledge in the form of triples. The methods of knowledge lighting-up are 
limited to downloading and searching. In the future, we will try more diverse types of resources, including 
KG schemas, bots, and knowledge graph algorithms. We will also explore richer modes of knowledge 
lighting-up, such as question answering, decentralized reasoning, and federated knowledge learning. 